Automated hypothesis testing

ABSTRACT

A method of automatically applying a hypothesis test to a data set. The method reduces errors made in failing to appreciate predicate assumptions of various statistical tests, and elicits a series of indications from the user regarding characteristics of interest embodied by the data set to select an appropriate statistical test. The system also reduces errors in constructing competing null and alternative hypothesis statements by generating a characterization of the data and defining null and alternative hypotheses according to the indications, selected statistical test, and conventions adopted with respect to the tests. The system also establishes a significance level, calculates the test statistic, and generates an output. The output of the system provides a plain interpretation of the quantitative results in the terms indicated by the user to reduce errors in interpretation of the conclusion.

RELATED APPLICATION

This application is a continuation of prior filed co-pending U.S. patent application Ser. No. 13/279,711, now U.S. Pat. No. 8,370,107, filed Oct. 24, 2011, which is a continuation of prior filed co-pending U.S. patent application Ser. No. 12/878,426, now U.S. Pat. No. 8,050,888, filed Sep. 9, 2010, which is a continuation of prior filed co-pending U.S. patent application Ser. No. 12/785,223, now U.S. Pat. No. 8,046,190, filed on May 21, 2010, which is a continuation of prior filed co-pending U.S. patent application Ser. No. 11/401,555, now U.S. Pat. No. 7,725,291, filed on Apr. 11, 2006, the entire content of each of which is hereby incorporated by reference.

FIELD OF THE INVENTION

The invention relates to methods and systems for using statistical analysis tools, and more particularly to methods and systems for automatically constructing and interpreting hypothesis tests.

BACKGROUND

Statistical tests provide a mechanism for making quantitative conclusions about characteristics or behavior of a process as represented by a sample of data drawn from the process. Statistical tests also are used to compare characteristics or behaviors of two or more processes based on respective data sets or samples drawn from the processes.

The term “hypothesis testing” describes a broad topic within the field of statistical analysis. Hypothesis testing entails particular methodologies using statistical tests to calculate the likely validity of a claim made about a population under study based on observed data. The claim, or theory, to which the statistical testing is applied is called a “hypothesis” or “hypothesis statement”, and the data set or sample under study usually represents a sampling of data reflecting an input to, or output of, a process. A well-constructed hypothesis statement specifies a certain characteristic or parameter of the process. Typical process characteristics used in hypothesis testing include statistically meaningful parameters such as the average or mean output of a process (sometimes also referred to as the “location” of the process) and/or the dispersion/spread or variance of the process.

When constructing a hypothesis test, a hypothesis statement is defined to describe a process condition of interest that, for the purpose of the test, is alleged to be true. This initial statement is referred to as the “null hypothesis” and is often denoted algebraically by the symbol H₀. Typically the null hypothesis is a logical statement describing the putative condition of a process in terms of a statistically meaningful parameter. For example, consider hypothesis testing as applied to the discharge/output of a wastewater treatment process. Assume there are concerns that the process recently has changed such that the output is averaging a higher level of contaminants than the historical (and acceptable) output of 5 parts of contaminant per million (ppm). A null hypothesis based on this data could be stated as follows: the level of contaminants in the output of the process has a mean value equal to or greater than 5 ppm. The null hypothesis is stated in terms of a meaningful statistical parameter, i.e., process mean, and in terms of the process of interest, i.e., the level of contaminants in the process output.

Likewise, hypothesis testing also entails constructing an alternative hypothesis statement regarding the process behavior or condition. For the purpose of the test, the status of the alternative hypothesis statement is presumed to be uncertain, and is denoted by the symbol H₁. An alternative hypothesis statement defines an uncertain condition or result in terms of the same statistical parameter as the null hypothesis, e.g., process mean, in the case of the wastewater treatment example. In that example, an alternative hypothesis statement would be defined along the following lines: the level of contaminants in the output of the process has a mean value of less than 5 ppm. In constructing null and alternative hypotheses, it is imperative that the statements be stated in terms that are mutually exclusive and exhaustive, i.e., such that there is neither overlap in possible results nor an unaccounted for or “lurking” hypothesis.

One object in applying hypothesis testing is to see if there is sufficient statistical evidence (data) to reject a presumed null hypothesis H₀ in favor of an alternative hypothesis H₁. Such a rejection would be appropriate under circumstances wherein the null hypothesis statement is inconsistent with the characteristics of the sampled data. In the alternative, in the event the data are not inconsistent with the statement made by the null hypothesis, then the test result is a failure to reject the null hypothesis, meaning the data sampling and testing do not provide a reason to believe any statement other than the null hypothesis. In short, application of a hypothesis test results in a statistical decision based on sampled data, and results either in a rejection of the null hypothesis H₀, which leaves a conclusion in favor of the alternative H₁, or a failure to reject the null hypothesis H₀, which leaves a conclusion wherein the null hypothesis cannot be found false based on the sampled data.

Any hypothesis test can be conducted by following the four steps outlined below:

Step 1: State the null and alternative hypotheses. This step entails generating a hypothesis of interest that can be tested against an alternative hypothesis. The competing statements must be mutually exclusive and exhaustive.

Step 2: State the decision criteria. This step entails articulating the factors upon which a decision to reject or fail to reject the null hypothesis will be based. Establishing appropriate decision criteria depends on the nature of the null and alternative hypotheses and the underlying data. Typical decision criteria include a choice of a test statistic and significance level (denoted algebraically as “alpha” or α) to be applied to the analysis. Many different test statistics can be used in hypothesis testing, including use of a standard or test value associated with the process data, e.g., the process mean or variance, and/or test values associated with the differences between two processes, e.g., differences between proportions/means/medians, ratios of variances and the like. The significance level reflects the degree of confidence desired when drawing conclusions based on the comparison of the test statistic to the reference statistic.

Step 3: Collect data relating to the null hypothesis and calculate the test statistic. At this step, data is collected through sampling and the relevant test statistic is calculated using the sampled data.

Step 4: State a conclusion. At this step, the appropriate test statistic is compared to its corresponding reference statistic (based on the null distribution), which shows how the test statistic would be distributed if the null hypothesis were true. Generally speaking, a conclusion can be properly drawn from the resultant value of the test statistic in one of several different ways: by comparing the test statistic to the predetermined cut-off values, which were established in Step 2; by calculating the so-called “p-value” and comparing it to the predetermined significance level α; or by computing confidence intervals. The p-value is a quantitative assessment of the probability of observing a value of the test statistic that is either as extreme as or more extreme than the calculated value of the test statistic, purely by random chance, under the assumption that the null hypothesis is true.
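The four steps can be illustrated with a minimal sketch of a one mean Z-test applied to the wastewater example. The sample readings, the assumed known standard deviation of 0.6 ppm, and the function name `one_mean_z_test` are hypothetical values chosen for illustration; they do not come from the specification.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_mean_z_test(sample, mu0, sigma, alpha=0.05, alternative="less"):
    """One mean Z-test with a known population standard deviation.

    Returns (z, p_value, reject_null). Illustrative sketch only.
    """
    n = len(sample)
    # Step 3: test statistic computed from the sampled data.
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # null distribution of z
    # Step 4: p-value = probability of a value at least this extreme under H0.
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1.0 - std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, p_value, p_value < alpha


# Step 1: H0: mean contaminant level = 5 ppm; H1: mean contaminant level < 5 ppm.
# Step 2: Z statistic, significance level alpha = 0.05.
readings_ppm = [4.6, 4.9, 4.4, 4.7, 4.5, 4.8, 4.3, 4.6, 4.7]  # hypothetical data
z, p_value, reject = one_mean_z_test(readings_ppm, mu0=5.0, sigma=0.6)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

With these made-up readings the p-value falls below 0.05, so the null hypothesis would be rejected in favor of the alternative. A different sample or a different assumed standard deviation could just as easily lead to a failure to reject.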

SUMMARY

There are several different forms of statistical tests that are useful in hypothesis testing. Those of skill in the art will understand how tests such as t-tests, Z-tests and F-tests can be used for hypothesis testing by way of the above methodology, but each may be appropriate only if a variety of predicates are found. In particular, the applicability of a particular test depends on, among other things, the nature of the hypothesis statements, the nature of the data available, and assumptions relating to the distributions and sampling of the data. For example, sometimes the hypotheses under consideration entail a comparison of statistical means, a comparison of variances, or a comparison of proportions. Similarly, the data may be either attribute data or variable/continuous data. With respect to assumptions of the distributions and sampling of data, different statistical tests are appropriate depending upon, for example, whether the sample sizes are large or small, time ordered or not, paired samples or not, or whether variances of the samples are known or not. The selection of an appropriate test for a particular set of predicates is imperative because application of an inappropriate test can result in unfounded or erroneous conclusions, which in turn lead to faulty decisions.

The proper construction of the null and alternative hypotheses also requires an understanding of the statistical test and its underlying assumptions. In addition, it is imperative that the null and alternative hypotheses be constructed so as to be mutually exclusive and exhaustive. Moreover, it is sometimes difficult for the practitioner to construct a meaningful set of competing null and alternative hypothesis statements in terms of the process and data of interest.

Interpreting the conclusions of a hypothesis test can also be difficult, even when the appropriate test is selected, an appropriate null hypothesis is subjected to the test statistic, and the test statistic is accurately calculated. This difficulty in interpretation can arise if the results of the test are not expressed in terms that relate the quantitative analysis to the terms used in describing the process, or if the basis for the conclusion is not clear. Indeed, in some cases, the construction of a null hypothesis and the associated data analysis results in an appropriate (but counterintuitive) conclusion that the null should not be rejected due to the absence of a data-supported basis that the null hypothesis is false. The logic underlying the conclusion is sound, but often misunderstood.

Hypothesis testing thus has the potential to bring powerful tools to bear on the understanding of complex process behaviors, particularly processes that behave in a manner that is not intuitive. Hypothesis testing brings the power and focus of data-driven analysis to decision making, which sometimes can be led astray by the complexities of the process of interest or biases of the decision maker. However, despite the power and usefulness of hypothesis testing, it remains a difficult tool to apply. One of the difficulties often encountered in applying hypothesis tests is the fact that each statistical test depends on multiple predicates or assumptions for validity. Applying a test statistic to a data set that does not embody the predicate assumptions can result in conclusions that are unsupported by the data, yet are not obviously so. Consequently, it is possible to make unfounded decisions in error.

Another problem with the application of hypothesis testing is the somewhat counter-intuitive requirement that the null hypothesis be stated and then the conclusion be drawn so as to either reject or fail to reject the null hypothesis (rather than merely accepting the null hypothesis). This difficulty is common for a variety of reasons, among them the requirements that the statements be mutually exclusive and exhaustive, that the statements be posed in terms of a statistically meaningful parameter that is appropriate in view of the process data to be sampled, and that the statements be stated in terms that will provide meaningful insight to the process, i.e., will be useful in making a decision based on the data.

In this regard, while it is important to state the hypothesis test in terms of the problem, it is equally important (and perhaps more important) to interpret the conclusions of the test in practical terms. Whether the test statistic supports either rejection or failure to reject the null hypothesis, the result needs to be correctly stated and understood in practical terms so that the results of the test can guide decisions pertaining to the process or processes.

Accordingly, the invention provides, in one embodiment, a method of automatically applying hypothesis testing to a data set. The method provides a plurality of statistical tests and, through a series of queries and indications, the method assures that multiple predicates or assumptions for validity of each statistical test are affirmatively considered. By confirming the assumptions and providing confirmatory notifications relating to the implications of the queries and indications, the method assures application of a statistical test to the data appropriate for the predicate assumptions of the test.

In another embodiment, the invention provides a method of automatically applying hypothesis testing to a data set including generating definitions of the null and alternative hypotheses in terms of a statistical test and its underlying assumptions, so as to be mutually exclusive and exhaustive, and in terms indicated by the user as being descriptive of the processes and data of interest.

In yet another embodiment, the invention provides a method of automatically applying hypothesis testing to a data set including generating test conclusions expressed in terms relating the quantitative analysis to terms indicated as describing the process, and providing the basis for conclusions in terms describing the process.

Other features and advantages of the invention will become apparent to those skilled in the art upon review of the following detailed description, claims, and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a computer system for implementing a software program embodying the invention.

FIG. 2 is a schematic diagram of a logic tree for implementing the hypothesis testing effected by the software program.

FIG. 3 illustrates a user interface generated by the software.

FIG. 4 illustrates a user interface generated by the software.

FIG. 5 illustrates a user interface generated by the software.

FIG. 6 illustrates a user interface generated by the software.

FIG. 7 illustrates a user interface generated by the software.

FIG. 8 illustrates a user interface generated by the software.

FIG. 9 illustrates a user interface generated by the software.

FIG. 10 illustrates a user interface generated by the software.

FIG. 11 illustrates a user interface generated by the software.

FIG. 12 illustrates a user interface generated by the software.

FIG. 13 illustrates a user interface generated by the software.

FIG. 14 illustrates a user interface generated by the software.

FIG. 15 is similar to FIG. 2 and schematically illustrates a second set of statistical tests that can be performed using the software.

DETAILED DESCRIPTION

Before any aspects of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. The terms “connected,” “coupled,” and “mounted” and variations thereof herein are used broadly and, unless otherwise stated, encompass both direct and indirect connections, couplings, and mountings. In addition, the terms connected and coupled and variations thereof herein are not restricted to physical and mechanical connections or couplings. As used herein the term “computer” is not limited to a device with a single processor, but may encompass multiple computers linked in a system, computers with multiple processors, special purpose devices, computers or special purpose devices with various peripherals and input and output devices, software acting as a computer or server, and combinations of the above. In general, computers accept and process information or data according to instructions (i.e., computer instructions).

The drawings illustrate a system for automatically applying hypothesis testing to one or more data sets having a variety of statistically significant characteristics. Specifically, with reference initially to FIG. 1, the system includes a general purpose computer 10. The computer 10 provides a platform for operating a software program that applies hypothesis testing to one or more data sets. In the system identified, data and program files are input to the computer 10, which reads the files and executes the programs therein. Some of the elements of the computer 10 include a processor 12 having an input/output (IO) section 14, a central processing unit (CPU) 16, and a memory module 18. In one form, the software program for applying hypothesis testing is loaded into memory 18 and/or stored on a configured CD ROM (not shown) or other storage device (not shown). The IO section 14 is connected to keyboard 20 and an optional user input device or mouse 22. The keyboard 20 and mouse 22 enable the user to control the computer 10. IO section 14 is also connected to monitor 24. In operation, computer 10 generates the user interfaces identified in FIGS. 3-14 and displays those user interfaces on monitor 24. The computer also includes CD ROM drive 26 and data storage unit 28 connected to IO section 14. In some embodiments, the software program for effecting hypothesis testing may reside on storage unit 28 or in memory unit 18 rather than being accessed through the CD ROM drive using a CD ROM. Alternatively, CD ROM drive 26 may be replaced or supplemented by a floppy drive unit, a tape drive unit, or other data storage device. The computer 10 also includes a network interface 30 connected to IO section 14. The network interface 30 can be used to connect the computer 10 to a local area network (LAN), wide area network (WAN), internet based portal, or other network 32. Any suitable interface can suffice, including both wired and wireless interfaces. Thus, the software may be accessed and run locally as from CD ROM drive 26, data storage unit 28, or memory 18, or may be remotely accessed through network interface 30. In the networked embodiment, the software would be stored remote from the computer 10 on a server or other appropriate hardware platform or storage device.

The software program provides algorithms relating to a plurality of statistical tests that can be applied under a variety of circumstances to the data sets. For example, the illustrated system provides the following statistical tests: one proportion Z-test, one proportion binomial test, two proportion Z-test, multi proportion Chi-square test, one mean Z-test, one mean t-test, two means Z-test, two sample t-test, F-test ANOVA, Chi square test, and an F-ratio test. FIG. 15 illustrates a second set of statistical tests, known as non-parametric tests: One Sample Sign Test, Paired Samples Sign Test, One Sample Wilcoxon Signed Ranks Test, Paired Samples Wilcoxon Signed Ranks Test, Mann Whitney Wilcoxon Test, Kruskal Wallis Test, and the Friedman Test. The system of course could include other statistical tests useful for hypothesis testing. These statistical tests are useful when applied to data embodying various characteristics. For example, some of the tests are useful when applied to attribute data while others are not. Similarly, some of the tests are useful when applied to data wherein the mean or location of the process from which the data is drawn is known, and others are not. Applying a test to a data set without understanding the assumptions underlying the test can generate erroneous or unfounded results.

The system also establishes conventions associated with each test of the plurality of tests. Among the conventions incorporated in the system is the convention of stating the null hypothesis as an equality for those tests wherein such a logical statement is appropriate. For example, the system avoids stating the null as being “greater than or equal to” a reference value. Another convention adopted by the system is to state the alternative hypothesis statement as an inequality, which logically follows from the convention of defining the null hypothesis.

The system automatically determines the appropriate statistical test. The determination of the appropriate statistical test is made automatically in response to indications or choices made by the user in response to queries or prompts generated by the system. The system design follows a logic map that forces the user to confront and affirm choices regarding the data and information available to the user seeking to apply hypothesis testing to the data. Not only does the system drive the user to application of the correct test, but it also informs the user of the implications of the choices and consequences of making inappropriate indications.

Initially, the system provides this determination process by seeking an indication as to whether the data set the user seeks to assess is time ordered. In response to this indication, the system generates a confirmatory notification explaining the importance in hypothesis testing of process stability. More particularly, assuming the data subjected to the hypothesis testing is randomly drawn from a process of interest, it is imperative that the process be stable. Otherwise, the results drawn from the hypothesis test are not meaningful.

The test determining step also includes seeking an indication of the nature of data as being attribute data or continuous data, again because some of the statistical tests are useful with attribute data and some are not. In the illustrated system, if the indication is that the data are attribute data, then the system further seeks an indication as to the number of samples from which the data are drawn, an indication of sample size, and an indication of normality of the data. Likewise, in response to an indication that the data are continuous, the system then seeks an indication as to the number of samples from which the data are drawn, an indication of sample size, and an indication as to whether the data are normal, not normal, or if normality is unknown.

The system, in determining the test, responds to the indications of normality. If the indication is that the data are either not normal or the normality of the data is unknown, then the system provides a confirmatory notification either to use a normality test to determine normality, to use non-parametric tests or to use data transformation functions.

Determining the test also includes identifying a statistical parameter of interest. Identifying a statistical parameter of interest includes selecting a parameter of interest from among the following common statistical parameters: proportion, mean, median, and variance of the data.

Determining the test also includes seeking an indication of whether, depending on the number of samples indicated, the data sample includes either paired data or differences between paired data samples. Likewise, if the parameter of interest is indicated as being the mean, then the system also seeks an indication of whether the variance of the population is known.

Ultimately, the system automatically selects the appropriate statistical test from among the plurality of tests based on the indications and established conventions, and further provides a confirmatory notification of the nature of the selected test, the indications and established conventions.
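The selection logic can be sketched as a small decision function. The logic tree of FIG. 2 is not reproduced in the text, so the mapping below is a hypothetical simplification based on conventional usage of the listed tests, not the patented tree itself. The function name `select_test`, its parameters, and the branch conditions are all assumptions made for illustration.

```python
def select_test(data_type, parameter, n_samples,
                variance_known=False, large_sample=True):
    """Map user indications to one of the listed test names.

    Hypothetical mapping for illustration; the actual logic tree of FIG. 2
    may differ in its branches and in the questions it asks.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    if data_type == "attribute" and parameter == "proportion":
        if n_samples == 1:
            # Assumed rule: normal approximation only for large samples.
            if large_sample:
                return "one proportion Z-test"
            return "one proportion binomial test"
        if n_samples == 2:
            return "two proportion Z-test"
        return "multi proportion Chi-square test"

    if data_type == "continuous" and parameter == "mean":
        if n_samples == 1:
            return "one mean Z-test" if variance_known else "one mean t-test"
        if n_samples == 2:
            return "two means Z-test" if variance_known else "two sample t-test"
        return "F-test ANOVA"

    if data_type == "continuous" and parameter == "variance":
        if n_samples == 1:
            return "Chi square test"
        if n_samples == 2:
            return "F-ratio test"

    raise ValueError("no test in this sketch matches the given indications")


print(select_test("continuous", "mean", 1, variance_known=False))
print(select_test("attribute", "proportion", 3))
```

Raising an error for unmatched combinations, rather than guessing, mirrors the idea that the user must affirm each predicate before a test is applied.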

The system also automatically characterizes the data set by establishing test criteria, selecting an appropriate reference test value depending on the test selected, and eliciting an indication of a description of the data of interest. Specifically, the system prompts the user to identify values for the statistic of interest, e.g., proportion, variance or mean. The system confirms the value and will prompt the user if inappropriate values are indicated. For example, the system will advise the user that the value of a population proportion must lie between zero and one. Likewise, the system prompts the user to provide descriptions of the data, e.g., names for the methods or treatments subjected to the hypothesis test. As described below, these indications, provided in the user's own language or terms, are used in confirmatory notifications, construction of the null and alternative hypotheses, and in explaining and interpreting conclusions drawn from the hypothesis test.
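A minimal sketch of this kind of value check follows. The function name `check_reference_value` and the message wording are hypothetical. The proportion rule comes from the text above (treated here as inclusive of zero and one, which is an assumption), while the non-negative variance rule is an added assumption based on the definition of variance.

```python
def check_reference_value(parameter, value):
    """Return a prompt message if the value is inappropriate, else None.

    Illustrative sketch; only two parameters are checked here.
    """
    if parameter == "proportion" and not 0 <= value <= 1:
        return "The value of a population proportion must lie between zero and one."
    if parameter == "variance" and value < 0:
        # Assumed extra rule: a variance is never negative.
        return "The value of a variance cannot be negative."
    return None


print(check_reference_value("proportion", 1.2))
print(check_reference_value("mean", 5.0))
```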

The system also automatically constructs the null and alternative hypothesis statements based in large part on the test selected, the data characterizations as indicated by the user, and according to the various conventions associated with the tests and data. Defining the null hypothesis includes generating a confirmation of the indications made by the user and the implications of the chosen fields. The system also provides a confirmatory notification of the null hypothesis statement. In one embodiment, the null hypothesis statement is made in terms of an equality. The system likewise automatically constructs the alternative hypothesis statement based on the selected test and assumed conventions relating to the selected test and indications of the test criteria and population description. The system provides a confirmatory notification of the alternative hypothesis statement and the implications of the choices made by the user.
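The construction of paired statements under the equality and inequality conventions can be sketched as follows. The function name `build_hypotheses`, its parameters, and the sentence templates are hypothetical; the specification does not disclose the exact wording the software generates.

```python
ALTERNATIVE_PHRASES = {
    "less": "less than",
    "greater": "greater than",
    "two-sided": "not equal to",
}


def build_hypotheses(parameter, description, reference_value, units, direction):
    """Build null and alternative statements in the user's own terms.

    Convention: the null is stated as an equality, the alternative as an
    inequality. Illustrative sketch with made-up sentence templates.
    """
    if direction not in ALTERNATIVE_PHRASES:
        raise ValueError("direction must be 'less', 'greater' or 'two-sided'")
    subject = f"the {parameter} of {description}"
    h0 = f"H0: {subject} is equal to {reference_value} {units}."
    h1 = f"H1: {subject} is {ALTERNATIVE_PHRASES[direction]} {reference_value} {units}."
    return h0, h1


h0, h1 = build_hypotheses(
    parameter="mean",
    description="the level of contaminants in the process output",
    reference_value=5,
    units="ppm",
    direction="less",
)
print(h0)
print(h1)
```

Because both statements are generated from the same parameter, description, and reference value, they cannot drift apart in wording, which is one way to reduce errors in hand-written hypothesis pairs.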

The system also seeks an indication of the desired significance level tobe applied to the hypothesis test, and describes the implications of thechoice of significance level in hypothesis testing. The system thenautomatically conducts the selected test and generates an output.Preferably, the output is in graphical and numeric form, and includestext using the terms provided by the user in describing the data.

In this regard, the system generates an output by calculating the value of the test statistic, cut-off values, confidence intervals, and p-values; comparing the calculated p-value to the indicated significance level; and comparing the value of the test statistic to one or more of the reference values, the cut-off values or confidence intervals in view of the null hypothesis statement. The system formulates and expresses the conclusion in terms of the selected test, the indicated test criteria and population descriptions, in terms indicated by the user, as to whether to reject the null hypothesis or not to reject the null hypothesis, and also states the basis for the conclusion. By using the terms supplied by the user and explaining the conclusion using both the indicated terms and the automatically calculated values of the test statistic, the system provides a tool for using hypothesis testing that reduces the likelihood of errors occurring through misunderstanding predicate assumptions of the tests, flawed null/alternative hypothesis statements and misinterpretation of the test results.
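The final comparison and plain-language conclusion can be sketched as below. The function name `state_conclusion` and the sentence wording are hypothetical; the sketch covers only the p-value comparison, not the cut-off value or confidence interval routes.

```python
def state_conclusion(p_value, alpha, null_claim, alternative_claim):
    """Express the decision and its basis using the user's own terms.

    Illustrative sketch: decides by comparing the p-value to alpha.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if p_value < alpha:
        return (
            f"Reject the null hypothesis: the data support the claim that "
            f"{alternative_claim} (p-value {p_value:.3f} is below the "
            f"significance level {alpha})."
        )
    return (
        f"Fail to reject the null hypothesis: the data do not give sufficient "
        f"evidence against the claim that {null_claim} (p-value {p_value:.3f} "
        f"is not below the significance level {alpha})."
    )


print(state_conclusion(
    p_value=0.026,
    alpha=0.05,
    null_claim="the mean level of contaminants is equal to 5 ppm",
    alternative_claim="the mean level of contaminants is less than 5 ppm",
))
```

The non-rejection branch is worded as a lack of evidence against the null rather than as acceptance of it, which matches the reject or fail-to-reject framing used throughout the description.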

The system is preferably in the form of computer-readable modules capable of providing the functionalities of the system. Those of skill in the art will also readily recognize that the description and disclosure of the system herein also describes and discloses a method for automatically applying hypothesis testing to a data set. While there are many possible embodiments of the software program, one commercially available embodiment is the Engine Room® data analysis software provided by Moresteam.com, which can be purchased online at www.moresteam.com/engineroom.

Various other features and advantages of the invention are set forth in the following claims.

What is claimed is:
 1. A method of selecting a hypothesis test to be applied to at least one data set using a display, an input device, and a processor, the method comprising: providing on the display a description, understandable by a user unfamiliar with statistical analysis, of a plurality of data types; providing on the display a description, understandable by a user unfamiliar with statistical analysis, of a plurality of statistical parameters of interest; providing on the display a description, understandable by a user unfamiliar with statistical analysis, of a plurality of sample sizes; receiving from an input device an indication of the data type of the one or more data sets; receiving from the input device an indication of which of the plurality of statistical parameters of interest is to be tested; receiving from the input device an indication of the sample size of the one or more data sets; selecting a test to execute based on received indications; and executing, by the processor, the selected test.
 2. The method of claim 1, wherein the processor is remote from the display and the input device.
 3. The method of claim 1, wherein the processor, display, and input device are incorporated in a computer local to the user.
 4. The method of claim 1, wherein a remote computing device provides the descriptions and the results of the test to the display, receives the indications from the input device, and executes the test.
 5. A method of automatic analysis of at least one data set by applying a hypothesis test to find differences in the at least one data set between a sample and a target, between two samples or between multiple samples using a display, an input device, and a processor, the method comprising: providing on a non-transitory computer readable medium a plurality of statistical tests applicable to the at least one data set; providing on the display a description, understandable by a user unfamiliar with statistical analysis, of a plurality of data types and a plurality of statistical parameters of interest; receiving from the input device an indication of a data type and a statistical parameter of interest; using the processor to select a test from among the plurality of tests based on the indications; providing on the display a description, understandable by a user unfamiliar with statistical analysis, of the selected test; receiving from the input device indications of a plurality of parameters of the test; using the processor to execute the selected test; and providing on the display a description, understandable by a user unfamiliar with statistical analysis, of the results of the test.
 6. The method of claim 5, wherein the processor is remote from the display and the input device.
 7. The method of claim 5, wherein the processor, display, and input device are incorporated in a computer local to the user.